Building up the case for time-dependent visualizations

The problem statement

These examples are re-used from section 2.6.5 of https://ggplot2-book.org/getting-started#sec-line.

The economics dataset from the ggplot2 package contains monthly economic data on the US, measured from July 1967 to April 2015.

Here is a brief look at the first 5 of the 574 rows of the economics data frame.

library(ggplot2)  # provides the economics data frame
data <- head(economics, n = 5)
knitr::kable(data)
date        pce    pop     psavert  uempmed  unemploy
1967-07-01  506.7  198712  12.6     4.5      2944
1967-08-01  509.8  198911  12.6     4.7      2945
1967-09-01  515.6  199113  11.9     4.6      2958
1967-10-01  512.2  199311  12.9     4.9      3143
1967-11-01  517.4  199498  12.8     4.7      3066
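
As a quick sanity check on the size and time span of the data, the following sketch inspects the number of rows and the range of dates. It assumes only that the ggplot2 package is installed; the names n_rows and date_range are illustrative and do not come from the original example.

```r
library(ggplot2)  # provides the economics data frame

n_rows <- nrow(economics)            # number of monthly observations
date_range <- range(economics$date)  # first and last observation dates

n_rows
date_range
```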

Let’s first make a simple time series plot of the unemployment rate, a continuous variable computed as the ratio unemploy / pop.

In ggplot2, a frame defines the first mapping from variables to the space in which the data will be represented. This mapping is created with the function aes(). The obvious frame for this plot is defined by the two variables date and unemploy / pop, which are mapped to the x and y coordinates of a 2-D plane. The glyphs drawn over this frame are lines between the data points located in the frame; they are created with the function geom_line(), which defines a layer over the frame.

ggplot(data = economics, mapping = aes(x = date, y = unemploy / pop)) +
  geom_line()
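
To make the distinction between frame and layer concrete, here is a minimal sketch that builds the plot in two steps. The object names frame_only and with_lines are illustrative. Printing frame_only should show the axes with no glyphs, since no layer has been added yet.

```r
library(ggplot2)

# The frame alone: axes for date and unemploy / pop, but no glyphs yet.
frame_only <- ggplot(economics, aes(x = date, y = unemploy / pop))

# Adding a layer draws lines between the data points in that frame.
with_lines <- frame_only + geom_line()

length(frame_only$layers)  # no layers on the bare frame
length(with_lines$layers)  # one layer after adding geom_line()
```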

Technically speaking, unemploy / pop is the number of unemployed people as a fraction of the total population, whereas the official unemployment rate is measured as a fraction of the labor force (see https://www.bls.gov/cps/cps_htgm.htm#definitions).
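
The ratio can also be computed once and stored as a column, rather than inside aes(). The sketch below does this under the assumption, taken from the ggplot2 documentation of the dataset, that unemploy and pop are both recorded in thousands. The names econ, unemp_share and p_share are hypothetical.

```r
library(ggplot2)

# Both unemploy and pop are recorded in thousands, so the ratio is unitless.
econ <- economics
econ$unemp_share <- econ$unemploy / econ$pop

# Same plot as before, now using the precomputed column.
p_share <- ggplot(econ, aes(date, unemp_share)) +
  geom_line()
p_share
```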

Another variable in the same dataset, uempmed, tracks the median duration of unemployment in weeks.

ggplot(economics, aes(date, uempmed)) +
  geom_line()

From these two plots one can observe the recent trend towards longer median unemployment durations in the 2010s. The unemployment rate also peaks in cycles of roughly 5 to 10 years.

An interesting question is how these two time series correlate over time. In ggplot2, the frame for this new representation can be defined by mapping one variable to the x coordinate and the other to the y coordinate of the plane. The glyphs are of two kinds: the observations are represented with the layer geom_point(), while their sequential trajectory, ordered by time, is captured by the layer geom_path(). The figure below shows such a graph.

ggplot(economics, aes(unemploy / pop, uempmed)) + 
  geom_path() +
  geom_point()
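
The reason for using geom_path() rather than geom_line() here can be illustrated with a small sketch. It uses a hypothetical three-row data frame, toy, that is not part of the original example. geom_path() connects observations in the order they appear in the data, which for economics is the order of time, whereas geom_line() connects them in order of the x variable.

```r
library(ggplot2)

# Hypothetical toy data: three observations whose x values are not in order.
toy <- data.frame(x = c(1, 3, 2), y = c(1, 2, 3))

# geom_path() joins the points in row order: (1,1) -> (3,2) -> (2,3).
p_path <- ggplot(toy, aes(x, y)) + geom_path() + geom_point()

# geom_line() joins them in order of x: (1,1) -> (2,3) -> (3,2).
p_line <- ggplot(toy, aes(x, y)) + geom_line() + geom_point()

# Inspect the order in which each layer visits the x values.
layer_data(p_path, 1)$x
layer_data(p_line, 1)$x
```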

It is hard to understand the direction of time from the lines alone. For example, it is difficult to see where the first year, the last year, or any year in between falls along the path.

This can be addressed by adding a mapping between the property colour and the year of each observation in the layer geom_point(). ggplot2 then uses a default continuous colour scale to assign specific colours from a colour palette to the years.
The ggplot2 package defines the function aes() to create this many-to-many mapping.

year <- function(x) as.POSIXlt(x)$year + 1900
ggplot(economics, aes(unemploy / pop, uempmed)) + 
  geom_path(colour = "grey50") +
  geom_point(aes(colour = year(date)))

In the layer geom_path(), every line segment between points receives the same colour value, given by the specification “grey50”. Because this colour is a fixed setting rather than a mapping from a variable, the syntax does not require the aes() function. It is a many-to-one mapping.
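
The difference between setting and mapping a colour can be sketched by inspecting the two layer objects directly. The fields aes_params and mapping are implementation details of ggplot2 layer objects and are used here only for illustration; the names path_layer and point_layer are hypothetical.

```r
library(ggplot2)

year <- function(x) as.POSIXlt(x)$year + 1900

path_layer  <- geom_path(colour = "grey50")          # setting: one fixed value
point_layer <- geom_point(aes(colour = year(date)))  # mapping: depends on data

path_layer$aes_params  # holds the constant colour
point_layer$mapping    # holds the mapping from colour to year(date)
```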

We can get a more sophisticated visualization by using animation to show how the two variables change together over time. In the following plot, the values of the unemployment rate and the median unemployment length in weeks are displayed year by year. By pressing the PLAY button, one sees the points for each year move along the line trajectory, from beginning to end. One can also use the slider to see the position of the variables in any given year.
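
Since each frame of the animation groups the monthly observations of one calendar year, it can help to check how the rows are distributed across frames. The sketch below uses the same year() helper as the plotting code; the name points_per_frame is illustrative.

```r
library(ggplot2)

year <- function(x) as.POSIXlt(x)$year + 1900

# Count the monthly observations that fall into each yearly frame.
points_per_frame <- table(year(economics$date))

length(points_per_frame)  # number of frames, one per calendar year
range(points_per_frame)   # fewest and most points in any frame
```

The first and last years are only partially covered by the data, so their frames contain fewer points than the others.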

library(plotly)
year <- function(x) as.POSIXlt(x)$year + 1900
p <- ggplot(economics, aes(unemploy / pop, uempmed)) + 
  geom_path(colour = "grey75") +
  geom_point(aes(colour = year(date), frame = year(date)))
# ggplot2 warns here: "Ignoring unknown aesthetics: frame".
# This is expected: ggplot2 does not know frame, but ggplotly() uses it to build the animation.
fig <- ggplotly(p)

fig <- fig %>% 
  animation_opts(1000, 
                 easing = "elastic", 
                 redraw = FALSE )

fig <- fig %>% 
  animation_button(x = 1, 
                   xanchor = "right",
                   y = 0, 
                   yanchor = "bottom")

fig <- fig %>%
  animation_slider(
    currentvalue = list(prefix = "YEAR ",
                        font = list(color="red")))
fig

Looking at where the data lie between 2009 and 2015, the plot strongly suggests that, for a given unemployment rate, the median unemployment length in weeks rose above the values recorded at comparable rates in earlier decades in the USA, according to this dataset, which starts in 1967.
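
A rough numerical companion to this visual reading is to compare the range of uempmed before and after 2009. This is only a sketch: the 2009 cut-off follows the text above, the names period and duration_range are hypothetical, and the comparison ignores the unemployment rate, so it is a coarser check than the animation.

```r
library(ggplot2)

year <- function(x) as.POSIXlt(x)$year + 1900

# Label each row by period; the 2009 cut-off follows the text above.
period <- ifelse(year(economics$date) >= 2009, "2009-2015", "1967-2008")

# Range of the median unemployment duration (weeks) within each period.
duration_range <- tapply(economics$uempmed, period, range)
duration_range
```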